<html>
<head><title>11-712 NLP Lab</title></head>
<body bgcolor="#FFFFFF">
<h1 align=center>Language Technologies Institute<br>11-712: Self-Paced Laboratory</h1>

<hr>

<h1>Algorithms for NLP:<br>
GLR Module Design Specs</h1>

<h2>Goal</h2> 
<p>Write a small parsing system that can handle simple declarative
sentences, simple NPs, and prepositional phrases, using simple
semantic restrictions to block attachment of PPs that aren't licensed
by a basic semantic lexicon.</p>

<h2>Syntactic Lexicon</h2>
<p>The syntactic lexicon should contain lines with the following form:</p>
<p align=center><code>("word" (feature value)+)</code></p>
<p>Look at the <a href="glr-lexicon.lisp">given syntactic lexicon</a> for an example.</p>
<p>The syntactic lexicon should encode the following slots:</p>
<ul>
<li><b>Nouns:</b> cat (n), number (sg or pl, for irregular forms), sem (see below)</li>
<li><b>Verbs:</b> cat (v), valency (trans or intrans), sem (see below)</li>
<li><b>Prepositions:</b> cat (p), semrole (see below)</li>
<li><b>Determiners:</b> cat (det), reference (definite or indefinite), number (sg or pl)</li>
</ul>

<p>The value of the <code>sem</code> feature is a symbol corresponding
to the word's entry in the semantic lexicon. It consists of a
prefix (*A- for verbs, *O- for nouns), a symbol denoting the root form
of the word (e.g., SEE), and an index that distinguishes the particular
meaning of the (root, pos) pair (a positive integer, starting at
1). For example, to encode both the transitive and intransitive
meanings of "see", we would use *A-SEE-1 and *A-SEE-2.</p>

<p>The value of the <code>semrole</code> feature is a symbol
corresponding to the relation (slot name) that the preposition denotes
in the semantic lexicon. It consists of a prefix (+) and the name
of the relation; e.g., for "in" we might encode the semantic role as
+LOCATION.</p>

<p>Since part of your assignment is to re-use the Tomita morphology
code and combine it with lexical lookup to inflect lexical entries for
number, you shouldn't have to include the <code>number</code> feature
unless you are encoding an irregular form not handled by the
morphology code (this piece of the assignment is described in more
detail later on).</p>
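<p>As a rough illustration of the entry format, entries for a few of
the words in this assignment might look like the sketch below. The
particular feature choices are assumptions (for example, that
<code>*A-SEE-1</code> is the transitive sense, and that "with" carries
<code>+INSTRUMENT</code>); the given syntactic lexicon remains the
authoritative model.</p>
<pre>
;; Hypothetical entries, shown only to illustrate ("word" (feature value)+)
("man"  (cat n) (sem *O-MAN-1))               ; regular noun, so no number feature
("men"  (cat n) (number pl) (sem *O-MAN-1))   ; irregular plural, so number is explicit
("see"  (cat v) (valency trans) (sem *A-SEE-1))
("with" (cat p) (semrole +INSTRUMENT))
("a"    (cat det) (reference indefinite) (number sg))
</pre>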

<h2>Semantic Lexicon</h2>

<p>The semantic lexicon should contain entries with the following form:</p>
<p align=center><code>(frame-name (slot-name slot-value)+)</code></p>
<p>Look at the <a href="glr-semantics.lisp">given semantic lexicon</a> for an example.</p>
<p>The semantic lexicon should encode the following features:</p>
<ul>
<li><b>Objects (Nouns):</b> =is-a (class), semroles</li>
<li><b>Actions (Verbs):</b> =is-a (class), semroles</li>
</ul>

<p>Lexical concepts (those which appear in the <code>sem</code>
feature in the syntactic lexicon) are always prefixed with
<code>*A-</code> or <code>*O-</code>, and appear as the first element
in their semantic lexicon entries.</p>

<p>Inheritance (e.g., IS-A) links are always prefixed with '='. For
this assignment, you only need to include <code>=IS-A</code>. These
links appear inside the semantic lexicon entry as a list, where the
first element is <code>=IS-A</code> and the second element is the
parent class.  </p>

<p>Classes are always prefixed with '&amp;', and appear as fillers in
<code>=IS-A</code> or semantic role slots.</p>

<p>Semantic role names (semroles) are always prefixed with
'+'. Semantic roles appear in semantic lexicon entries as lists, where
the first element is the role name and the second element denotes the
class restricting the set of legal fillers for the role.</p>
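<p>Putting these conventions together, a few frames might be written
as in the sketch below. The role lists are the ones named in the
Coverage section of this page; the <code>=IS-A</code> parent shown for
<code>*O-MAN-1</code> is an assumption that should be checked against
the hierarchy figure, and parent classes that this page does not state
are left out. The given semantic lexicon remains the authoritative
model.</p>
<pre>
;; Hypothetical frames, shown only to illustrate (frame-name (slot-name slot-value)+)
(*O-MAN-1 (=IS-A &amp;O_ANIMATE))                  ; assumed parent class
(&amp;O_ANIMATE (+ACCOMPLICE &amp;O_ANIMATE))          ; role restricted to animate fillers
(*A-SEE-1 (+INSTRUMENT &amp;OPTICAL_INSTRUMENT))   ; =IS-A link omitted here
</pre>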

<h2>Coverage</h2>

<p>You will need to write syntactic lexicon entries for every word
that occurs in these phrases and sentences:
<pre>
a man
the man
the men
the boy
the boys
the man sees
the man sees the boy
the man sees the boy with the telescope
the man sees the boy with the dog
</pre></p>

<p>You will need to write semantic lexicon entries to handle these concepts:
<pre>
*A-SEE-1
*A-SEE-2
&O_ANIMATE
*O-MAN-1
*O-BOY-1
*O-DOG-1
*O-TELESCOPE-1
</pre></p>

<p>You will need to encode these semantic roles:
<pre>
*A-SEE-1, *A-SEE-2: (+INSTRUMENT &OPTICAL_INSTRUMENT)
&O_ANIMATE: (+ACCOMPLICE &O_ANIMATE)
</pre></p>

<p>Your semantic frames should model this hierarchy fragment:</p>

<img align=center src="hierarchy.gif">

<h2>Loading the Lexicons</h2>

<p>In the <a href="given-code.lisp">given code file</a>, you will find
functions <code>load-lexicon</code> and <code>load-semantics</code>,
which you can use to load your completed lexicons into Lisp.</p>

<h2>Syntactic Grammar</h2>

<p>The grammar should include rules for the following constructions:
<pre>
&lt;start&gt; &lt;==&gt; (&lt;np&gt;)
&lt;start&gt; &lt;==&gt; (&lt;vp&gt;)
&lt;start&gt; &lt;==&gt; (&lt;np&gt; &lt;vp&gt;)
&lt;np&gt; &lt;==&gt; (&lt;np&gt; &lt;pp&gt;)
&lt;np&gt; &lt;==&gt; (&lt;det&gt; &lt;n&gt;)
&lt;np&gt; &lt;==&gt; (&lt;n&gt;)
&lt;vp&gt; &lt;==&gt; (&lt;vp&gt; &lt;pp&gt;)
&lt;vp&gt; &lt;==&gt; (&lt;v&gt; &lt;np&gt;)
&lt;vp&gt; &lt;==&gt; (&lt;v&gt;)
&lt;pp&gt; &lt;==&gt; (&lt;p&gt; &lt;np&gt;)
</pre></p>

<h2>Lexical Lookup, Morphological Inflection</h2>

<p>Your grammar <i>should not</i> contain lexical rules; instead, you
should use the Tomita "wildcard" rule syntax, and write a Lisp callout
function to look up lexical items:
<pre>
&lt;n&gt; &lt;-- (%)
&lt;v&gt; &lt;-- (%)
&lt;det&gt; &lt;-- (%)
&lt;p&gt; &lt;-- (%)
</pre></p>

<p>The form of each rule should be like this:
<pre>
(&lt;n&gt; &lt;-- (%)
     ((x0 &lt;= (parse-eng-word (string-downcase (symbol-name (x1 value)))))
      ((x0 cat) = n)))
</pre></p>
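<p>The argument expression in this rule turns the matched token into a
lowercase string for lexicon lookup. A minimal illustration, assuming
the token arrives as a symbol read under the default (upcasing) Lisp
reader settings:</p>
<pre>
;; The symbol written as man is stored with the name "MAN";
;; string-downcase then yields the lowercase lookup key.
(symbol-name 'man)                     ; =&gt; "MAN"
(string-downcase (symbol-name 'man))   ; =&gt; "man"
</pre>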

<p>You should write a function called <code>parse-eng-word</code>, which
performs morphological analysis on its string argument and returns the
inflected lexical f-structure for the word. This should be done in three
steps:</p>
<ol>
<li>Use the built-in function <code>parse-eng-morph</code>
to return the set of possible <code>("root" morph)</code> pairs for the word;</li>
<li>Look up each root form in the lexicon to see whether it exists;</li>
<li>If a morpheme was found attached to the root, inflect any matching lexical entries appropriately. Write a function called <code>inflect-lex</code> to inflect nouns and verbs, as follows:
<pre>
INFLECT-LEX

Assigns agreement features for N and V, depending on presence or
absence of +S morpheme and/or explicit lexical features:

 N: 
   - Defaults to (PERSON 3), unless feature supplied by lexicon
   - Defaults to (NUMBER SG), unless:
            * feature supplied by lexicon
            * +S is present -> (NUMBER PL)
 V: 
   - If +S present, (PERSON 3), else will unify with any SUBJ
     (functionally the same as (*OR* 1 2 3))
   - Defaults to (NUMBER PL), unless:
            * feature supplied by the lexicon
            * +S is present -> (NUMBER SG)
</pre></li>
</ol>
<p>(Hint: you should study the data
structure provided by the <code>load-lexicon</code> function, so that you
can retrieve the uninflected lexical items from the lexicon using
<code>gethash</code>.)</p>
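<p>The three steps above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the required
implementation: it assumes that an f-structure is a list of
<code>(feature value)</code> pairs, that the morpheme is the symbol
<code>+S</code> (or NIL when absent), and that the lexicon is a hash
table mapping a root string to a list of f-structures. The helpers
<code>fs-value</code> and <code>fs-default</code>, and the name
<code>parse-eng-word-sketch</code> with its extra arguments, are
hypothetical; the morphology function is passed in as
<code>morph-fn</code> so that the sketch does not depend on the exact
behavior of <code>parse-eng-morph</code>.</p>
<pre>
;; Sketch only. Assumes an f-structure is a list of (feature value)
;; pairs and that the morpheme is the symbol +S, or NIL if absent.

(defun fs-value (fs feature)
  "Return the value stored under FEATURE in FS, or NIL if absent."
  (second (assoc feature fs)))

(defun fs-default (fs feature value)
  "Return FS with (FEATURE VALUE) added, unless FS already has FEATURE."
  (if (assoc feature fs)
      fs
      (append fs (list (list feature value)))))

(defun inflect-lex (fs morph)
  "Apply the N and V agreement defaults described above to FS."
  (let ((has-s (eq morph '+s)))
    (case (fs-value fs 'cat)
      ;; N: person defaults to 3; number is PL with +S, otherwise SG.
      (n (fs-default (fs-default fs 'person 3)
                     'number (if has-s 'pl 'sg)))
      ;; V: person is 3 only with +S; number is SG with +S, otherwise PL.
      (v (fs-default (if has-s (fs-default fs 'person 3) fs)
                     'number (if has-s 'sg 'pl)))
      ;; Other categories are returned unchanged.
      (t fs))))

(defun parse-eng-word-sketch (word lexicon morph-fn)
  "Collect inflected entries for every (root morph) analysis of WORD.
LEXICON is assumed to map a root string to a list of f-structures;
MORPH-FN stands in for parse-eng-morph."
  (loop for (root morph) in (funcall morph-fn word)
        append (loop for entry in (gethash root lexicon)
                     collect (inflect-lex entry morph))))
</pre>
<p>With these definitions, <code>(inflect-lex '((cat n) (sem *O-BOY-1)) '+s)</code>
returns <code>((CAT N) (SEM *O-BOY-1) (PERSON 3) (NUMBER PL))</code>.
Note that in this sketch a feature supplied by the lexicon always wins
over the default, and the result is a plain list of f-structures; your
own <code>parse-eng-word</code> must return whatever form the grammar
rule expects.</p>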

<h2>Compiling and Loading the Grammar</h2>

<p>You should use the <code>compgra</code> function to compile and
load the grammar (see the example in the <a
href="given-code.lisp">given code file</a>). You will need to
recompile your grammar with <code>compgra</code> after every change to
the grammar before you can test that change.</p>

<h2>Semantic Restrictions on PP Attachment</h2>

<p>Once you have your grammar working, you should add Lisp callouts to
the rules which attach PPs to NP and VP, in order to implement
semantic restrictions.</p>

<p>The function <code>semrole-filler-match</code>, provided in the <a
href="given-code.lisp">given code file</a>, will do most of the work
for you. Its arguments are the semantic lexicon entry for the head
(NP or VP), the semrole (from the P's syntactic lexicon entry), and
the semantic lexicon entry for the filler (the PP object). This
function returns T or NIL depending on whether the semantic
lexicon contains information that licenses the given attachment, using
inheritance methods defined in the function.</p>

<p>To use this function, you will have to write a grammar
callout called <code>license-attachment</code>. It takes two
arguments: the f-structure for the head (NP or VP) and the f-structure
for the filler (PP). It should extract the appropriate information
from the f-structure(s) and/or the semantic lexicon, call
<code>semrole-filler-match</code>, and return either the new
f-structure (head with PP attached) or NIL, depending on the result of
that call.</p>

<p>Your code for <code>license-attachment</code> should print a trace
message signalling the result of each call; see the examples
(mentioned below) for the format of the messages.</p>
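<p>The overall shape of such a callout might look like the sketch
below. Nearly every detail is an assumption made for illustration: that
f-structures are lists of <code>(feature value)</code> pairs, that the
PP stores its object under a slot named <code>OBJ</code>, that the
attached PP is stored under a slot named <code>PP</code>, and that the
semantic lexicon is a hash table from frame name to frame. The matcher
is passed in as an argument standing in for
<code>semrole-filler-match</code>, so this hypothetical
<code>license-attachment-sketch</code> takes four arguments instead of the
two that your callout must take, and its trace line is only a
placeholder for the required format.</p>
<pre>
;; Sketch only. MATCHER stands in for semrole-filler-match and is
;; called with (head-frame semrole filler-frame).

(defun fs-value (fs feature)
  "Return the value stored under FEATURE in FS, or NIL if absent."
  (second (assoc feature fs)))

(defun license-attachment-sketch (head pp semantics matcher)
  "Return HEAD with PP attached if MATCHER licenses it, else NIL."
  (let* ((head-sem   (fs-value head 'sem))
         (role       (fs-value pp 'semrole))
         (filler-sem (fs-value (fs-value pp 'obj) 'sem))
         (ok (funcall matcher
                      (gethash head-sem semantics)
                      role
                      (gethash filler-sem semantics))))
    ;; Placeholder trace line; the required format is in the examples.
    (format t "~A ~A ~A: ~:[blocked~;licensed~]~%"
            head-sem role filler-sem ok)
    (when ok
      (append head (list (list 'pp pp))))))
</pre>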

<h2>Examples</h2>

<p>When you're all done, you should get outputs like those shown in
this <a href="test-output.txt">set of examples</a>. (See the <a href="instructions.html">instructions</a> for how to run the testing function, <code>(run-tests)</code>.)</p>

<hr>

<i>5-Nov-96 by <a href="mailto:ehn@cs.cmu.edu">ehn@cs.cmu.edu</a></i>

</body>
</html>